Identifying contrarian terms based on website content

ABSTRACT

A system and method for identifying contrarian terms is disclosed. The system and method includes identifying a plurality of websites describing a product or service, analyzing content in the plurality of websites, the content relating to the described product or service, identifying contrarian terms based on the website content using a processing circuit, wherein the contrarian terms comprise descriptions of a product or service, associating the identified contrarian terms with a category, within which the product or service has been categorized in at least one of the plurality of websites, and storing the contrarian terms and the associated category in a memory.

RELATED APPLICATION

The present application claims priority to and is a continuation of U.S. Non-provisional application Ser. No. 13/485,337, entitled “Identifying Contrarian Terms Based on Website Content” and filed on May 31, 2012, which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND

Users of a content management system, such as advertisers, select keywords to associate with content, such as advertisements. Content is displayed on a client device when terms corresponding to the keywords are entered into a search engine by a user of a client device or are associated with a website accessed by a user of a client device. Determining which keywords to select for a particular content account can be a lengthy and daunting process filled with uncertainty as to the effectiveness of selected keywords.

SUMMARY

One embodiment described herein relates to a method for identifying contrarian terms. The method includes identifying a plurality of websites describing a product or service, analyzing content in the plurality of websites, the content relating to the described product or service, and identifying contrarian terms based on the website content using a processing circuit, wherein the contrarian terms are descriptions of a product or service. Furthermore, the method includes associating the identified contrarian terms with a category, within which the product or service has been categorized in at least one of the plurality of websites and storing the contrarian terms and the associated category in a memory.

An additional embodiment described herein relates to a system for identifying contrarian terms. The system includes at least one processor configured to execute a computer program stored in at least one memory to identify a plurality of websites describing a product or service, analyze content in the plurality of websites, the content relating to the described product or service, identify contrarian terms based on website content wherein the contrarian terms comprise descriptions of a product or service, associate the identified contrarian terms with a category within which the product or service has been categorized in at least one of the plurality of websites, and store the contrarian terms and the associated category in memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings and the claims.

FIG. 1 is a schematic diagram of a network environment comprising a client-server architecture;

FIG. 2 is a schematic diagram of a website hierarchy accessible by a network;

FIG. 3 is an example of a web page included in a website hierarchy according to one embodiment;

FIG. 4 is another example of a web page included in a website hierarchy according to one embodiment;

FIG. 5 is a flow chart of a process for detecting contrarian terms from website data according to one embodiment; and

FIG. 6 is an example of a graphical user interface used to display keywords according to one embodiment.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Before describing in detail the embodiments of the improved system and method, it should be observed that the invention includes, but is not limited to, a novel structural combination of data processing components and communications networks. Accordingly, the structure, methods, functions, control and arrangement of components and circuits have, for the most part, been illustrated in the drawings by readily understandable block representations and schematic diagrams, in order not to obscure the disclosure with structural details which will be readily apparent to those skilled in the art, having the benefit of the description herein. Further, the invention is not limited to the particular embodiments depicted in the illustrative diagrams, but should be construed in accordance with the language in the claims.

Entities that sell products and services continually seek to improve the effectiveness of their online content. Content platforms facilitate selection of content such as advertisements for display on web pages such as search engine result web pages or other website web pages, for example. One way in which content may be selected for display is through keywords found in data associated with web pages or keywords entered into a search engine. According to one embodiment, a user of content platform 198, such as an advertiser, may select one or more keywords to associate with content such as an advertisement, for example. Subsequently, when a user navigates to a website associated with a keyword selected by the advertiser, the advertiser's content may be served and/or displayed to the user. Keywords selected by users such as advertisers may be stored in a content account along with various other user settings in a content platform. Content platforms may include content or advertising servers, content management systems, and the like. Once advertisers have selected a particular keyword, users who visit web pages that contain that particular keyword or who enter that particular keyword into a search engine may be shown content as determined by the advertiser who selected the particular keyword.

Referring to FIG. 1, a schematic diagram of a network environment 100 that includes a client-server architecture will be described. Network environment 100 allows for various content distribution processes described herein. Furthermore, network environment 100 enables various systems and methods described herein to identify, collect, store, and utilize contrarian terms to improve an advertiser's ability to select relevant keywords for their content, such as an advertisement. Network element 124 may include a local area network (LAN), wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), a wireless link, an intranet, the internet, or combinations thereof. Client device 102 can include a processing circuit 160, a memory 170, a network interface 180, a user input element 104 as well as a display 150. Display 150 is in electronic communication with one or more processors that cause visual indicia to be provided on display 150. Display 150 may be located inside or outside of the housing of the one or more processors. For example, display 150 may be external to a desktop computer (e.g., display 150 may be a monitor), may be a television set, or may be any other stand-alone form of electronic display. In other examples, display 150 may be internal to a laptop computer, mobile device, or other computing device with an integrated display.

In general, a client device 102 may be any type of processor-based platform that is connected to a network 100 and that interacts with one or more applications. The display 150 includes a browser window 106 which is provided on display 150 as a result of running a browser software application on client device 102, such as the Google Chrome web browser. The browser window 106 displays content such as web pages from various websites to facilitate user interaction with the web pages. These websites may include websites selling products and/or services, for example. Websites that offer products and/or services may be identified by website source code such as HTML identifiers, for example. Furthermore, websites generally available to network 124 may include instructions in the form of computer code embodied in any suitable computer-programming language, such as, but not limited to, C, C++, C#, Go, Java, JavaScript, Perl, Python and Visual Basic. This computer code may be stored on a network accessible computing device such as a website server 146.

Website server 146 may provide a web page to client 102, in response to receiving a request for a web page from client 102. In some embodiments, computer code from content platform 198 previously inserted into a web page being requested by a client 102 will request content from a content or advertising server 140 to be displayed with the requested web page. Content platform 198 may similarly provide content for display with search engine results. Content platform 198 may select an advertisement to display to a client device 102, based on a search term entered into a search engine 126 by a client device 102. In addition, content platform 198 may select content to display to a client device 102, based on keywords associated with a web page requested by a client device 102. According to one embodiment, the content of a requested web page or search engine query may be parsed for keywords by performing character recognition or another word recognition function to recognize keywords. Once a keyword in web page content or a search engine query is identified, the keyword may be compared with the keywords selected in each content account 196, stored at content platform 198, according to one embodiment. If a keyword match is detected, the content platform 198 may select content, such as an advertisement associated with the keyword as determined by a particular user's content account. In addition to keyword matching, content platform 198 may take into consideration other factors to determine which content is selected such as bids from users, content quality, predicted click through rate of the content, etc.
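The keyword comparison described above can be illustrated with a minimal Python sketch. It assumes that each content account 196 is represented as a list of keyword phrases and that a match means the whole phrase appears in the page content or query; the function name match_accounts and the account identifiers are hypothetical, and bids, content quality, and click through rate are not modeled.

```python
import re

def match_accounts(text, accounts):
    """Return ids of accounts having a keyword phrase that appears in text.

    accounts maps a hypothetical account id to a list of keyword phrases.
    """
    # Reduce the page content or query to lowercase words, padded with
    # spaces so that only whole-word phrase matches are detected.
    words = re.findall(r"[a-z0-9']+", text.lower())
    normalized = " " + " ".join(words) + " "
    matched = []
    for account_id, keywords in accounts.items():
        if any(" " + kw.lower() + " " in normalized for kw in keywords):
            matched.append(account_id)
    return matched
```

For example, with one account holding the keywords “digital camera” and another holding “laptop,” a query containing “Digital Camera” would match only the first account.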

Accordingly, users such as advertisers attempt to select keywords that are as closely related to their product or service as possible to reduce cost and to provide content to users for whom the content is the most relevant (e.g., users most likely to become customers). According to one embodiment, advertisers may interface with content platform 198 using a computing device 102 to access content accounts 196. Advertisers may upload content, set content settings, select keywords to associate with content, etc. within a content account 196. For example, if an advertiser sells digital cameras, they may select the keywords “digital camera” for their content account and associate the keywords “digital camera” with content related to a digital camera containing a link to the advertiser's website. Subsequently, when a user enters the search term “digital camera” into a search engine 126, the content for a digital camera that the advertiser associated with the keywords “digital camera” may be displayed to that user in browser window 106. As another example, if a client device 102 requests a web page 108 that is associated with the keywords “digital camera,” content platform 198 may select content for a digital camera that an advertiser associated with the keywords “digital camera.” The digital camera content may then be displayed to that user in a predetermined location of the web page when it is displayed in browser window 106.

According to one embodiment, content platform 198 aids users such as advertisers in selecting keywords by suggesting keywords to the advertiser. According to one embodiment, keyword suggestions may be provided by a keyword suggestion module 162 stored in a memory of content platform 198. Graphical user interface 600, shown in FIG. 6, is one example of an interface for a keyword suggestion program. Advertisers may enter a word, phrase and/or category (e.g., at entry fields 602 and 604) related to their products and services into the keyword suggestion software and receive back a list of keyword suggestions 606. Each keyword suggestion in the list of keyword suggestions 606 may be a single keyword or multiple keywords. Once the keyword suggestions are displayed, the advertiser may select one or more keywords to add to their content account to be associated with particular content. The list of keyword suggestions may be ranked in order of relevance to the entered word or phrase, cost to select the keyword, popularity, effectiveness, etc., or a combination of multiple factors.

Another way to suggest keywords to a user such as an advertiser is based on using contrarian terms found on websites stored on website servers 146. Generally, contrarian terms are descriptions of a product or service that are mutually exclusive with respect to a particular category of product or service. Contrarian term descriptions may be a one word term or multiple word terms. For example, “men's jeans” and “women's jeans” may be contrarian term descriptions of a product within the category of apparel. According to one embodiment, these terms are considered mutually exclusive descriptions in that products offered on a website that are identified as men's jeans could not also be identified as women's jeans. According to one embodiment, terms such as “men's” and “women's” may be contrarian in particular categories such as jeans, but may not be contrarian in general. For example, the terms “men's” and “women's” may not be considered contrarian with respect to the category of laptop computers. Furthermore, terms may be deemed contrarian according to the processes described herein without being considered mutually exclusive in a grammatical or logical context. For example, if a large number of websites organize their website hierarchy such that “underwater” cameras and “child” cameras may be considered distinct products within the “camera” category, these terms may be considered contrarian despite the fact that a small number of websites may offer underwater child cameras.

According to various embodiments described herein, website hierarchies can be analyzed to identify and store these contrarian terms. Providing the stored contrarian terms to a keyword suggestion module 162 allows the program to provide a more accurate and tailored list of keyword suggestions to an advertiser. For example, when a word or phrase is entered by an advertiser into the keyword suggestion interface 600, the keyword suggestion software may compare the word or phrase with a list of keyword suggestions that are contrary to the entered word or phrase. According to one embodiment, all keyword suggestions that are contrary to the entered word or phrase may be demoted or removed from the list of keyword suggestions 606 provided to the advertiser so that the advertiser is less likely to select a keyword suggestion that is irrelevant to the advertiser's products and/or services.
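One possible way to demote or remove contrary suggestions is sketched below in Python. The sketch assumes the stored contrarian terms for a category are available as a set of strings; the function name apply_contrarian_terms, the demote flag, and the whole-word matching helper are illustrative assumptions rather than details of keyword suggestion module 162.

```python
def _contains(phrase, term):
    # Whole-word match, so that "men's" is not found inside "women's".
    return f" {term.lower()} " in f" {phrase.lower()} "

def apply_contrarian_terms(entered, suggestions, contrarian_terms, demote=True):
    """Demote (or remove) suggestions that are contrary to the entered phrase.

    contrarian_terms is a set of mutually contrarian terms for one category.
    """
    # Contrarian terms that the advertiser actually entered.
    entered_terms = {t for t in contrarian_terms if _contains(entered, t)}
    if not entered_terms:
        return list(suggestions)
    # Every other term in the set is contrary to what was entered.
    contrary = contrarian_terms - entered_terms
    keep, conflicting = [], []
    for suggestion in suggestions:
        if any(_contains(suggestion, t) for t in contrary):
            conflicting.append(suggestion)
        else:
            keep.append(suggestion)
    # Demotion moves conflicting suggestions to the end; removal drops them.
    return keep + conflicting if demote else keep
```

With the terms “men's” and “women's” stored for the jeans category, an advertiser who enters “men's jeans” would see a suggestion such as “women's jeans sale” moved to the end of the list, or dropped when demote is false.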

Website hierarchies, website navigation paths, and web page computer code may all be used to determine which keywords found on a web page are considered contrarian. Website publishers use common techniques to display products and/or services on web pages so that users of the website can navigate to products and/or services in an intuitive and straightforward manner. For example, websites may organize products and services into a website hierarchy such that web pages on the lowest level of a website hierarchy, known as leaf pages (much like a leaf on a tree), are devoted to a product or service that is contrarian to a product or service found in other leaf pages. In addition, websites regularly organize products and services into lists such that products and services in the same list are contrarian to each other. In many cases, websites display a list of products or services related to a particular category. For example, a website may display a list of digital camera brands, each of which is mutually exclusive from the others because a given digital camera cannot be made by two different brands of camera. These leaf pages and lists may be identified by various methods described in further detail below.

One example of a website hierarchy 200 is shown in FIG. 2. The website hierarchy 200 is the framework by which the content of the website is presented. A website hierarchy 200 may predefine how users are able to navigate through a website. Each node 204, 206, 208, 210, 212, etc. in hierarchy 200 represents a web page containing content. The content may include information related to a product and/or service as well as one or more links to other web pages that a user may navigate to, as determined by the website hierarchy.

Website code stored on servers 140 or 146 may contain data that organizes websites in a hierarchy 200 as shown by elements in FIG. 2. This website code can be accessed by a contrarian term detection process stored in the memory of a computing device at content platform 198, such as memory module 132, to identify, confirm, retrieve, store, and/or utilize contrarian terms located in the website computer code, according to one embodiment. In general, detecting contrarian terms in a website hierarchy 200 may include using a computer code parser to identify leaf pages and product or service lists found in website hierarchy 200. The code parser may read computer code tags in web page content, URL addresses, or a website file directory, for example, to identify leaf pages and lists found in website hierarchy 200. One example of a contrarian term detection process 500 is shown in FIG. 5, described in further detail below. According to one embodiment, process 500 is an iterative process that is carried out for one website and is repeated for a series of websites such as a predetermined number of websites, for example. According to one embodiment, the same contrarian terms must be detected at a predetermined number of websites before being confirmed as contrarian terms that are used in keyword suggestion module 162.
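The confirmation requirement can be sketched as a simple count over websites. The Python sketch below assumes each analyzed website yields a list of (category, terms) groups and that confirmation is tracked per pair of terms; the name confirm_contrarian_pairs and the pairwise representation are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def confirm_contrarian_pairs(per_site_terms, min_sites):
    """Return the (category, term_a, term_b) triples seen on enough websites.

    per_site_terms holds one entry per website; each entry is a list of
    (category, terms) groups of potential contrarian terms found there.
    """
    counts = Counter()
    for site_terms in per_site_terms:
        seen = set()  # count each pair at most once per website
        for category, terms in site_terms:
            for a, b in combinations(sorted(set(terms)), 2):
                seen.add((category, a, b))
        counts.update(seen)
    return {key for key, n in counts.items() if n >= min_sites}
```

Sorting each group before pairing ensures that the same two terms produce the same triple regardless of the order in which a website lists them.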

Referring again to website hierarchy 200 as shown in FIG. 2, a tree structure 200 is shown which represents a data structure stored in a memory (such as a database, file system, read only memory, etc.). The data structure may be stored in memory 182 of one or multiple website servers 146. Nodes such as 206, 208, and 210 represent individual web pages which may be related to a product or service category, according to one embodiment. A node identifier, such as a computer code tag, may be stored in memory of a website server 146 for each node. The node identifier may include a textual description of the node, data indicating its relationship(s) to other nodes, etc. Nodes such as parent node 212 may have child nodes such as child nodes 220, 222, and 224. Parent nodes may be nodes that are one level above and connected in the same branch as another node, according to one embodiment. Child nodes may be nodes that are one level below a node and are connected in the same branch as that node, according to one embodiment. For example, node 206, representing a web page devoted to electronics, is the parent node of child nodes 212 and 226 because it is in level 270, above level 280, and is connected in the same branch. Furthermore, node 212, devoted to cameras, is a parent node to child nodes 220, 222, and 224 because it is in level 280, above level 290, and is connected in the same branch. In many cases, website publishers design website hierarchies such that parent nodes may be a category of a product or service while child nodes may be sub-categories of the parent node category.

For example, with respect to the electronics web page 206 in level 270 of hierarchy 200, the category of electronics products is refined further in level 280 to describe a sub-category of electronics product in web page 212, cameras. Furthermore, in lower level 290, the sub-category of cameras is further refined into web pages devoted to different sub-categories of cameras including a “digital SLR” camera page 220, an “underwater” camera page 222, and a “children's” camera page 224. Because nodes 220, 222, and 224 do not have child nodes, the contrarian term detection program 132 may identify these nodes as leaf nodes, each containing a contrarian term with respect to one another. For example, because nodes 220, 222, and 224 may be all leaf nodes within the category of “cameras” as identified by parent node 212, keywords “Digital SLR,” “underwater,” and “Children's,” may all be considered contrarian terms with respect to one another within the “cameras” category.
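The leaf node reasoning above can be sketched in Python over a small in-memory tree. The sketch assumes the hierarchy is available as a mapping from each node to its child nodes plus a mapping from each node to its textual description; both mappings and the function name leaf_terms_by_category are hypothetical stand-ins for the node identifiers stored at website server 146.

```python
def leaf_terms_by_category(tree, labels):
    """Group the labels of sibling leaf nodes under their parent's label.

    tree maps a node id to a list of child node ids; labels maps a node id
    to its textual description. Nodes without children are leaf nodes.
    """
    result = {}
    for parent, children in tree.items():
        leaves = [labels[c] for c in children if not tree.get(c)]
        # A single leaf has nothing to be contrarian with.
        if len(leaves) > 1:
            result[labels[parent]] = leaves
    return result

# Part of hierarchy 200: electronics 206 -> cameras 212 -> leaf pages.
tree = {206: [212], 212: [220, 222, 224]}
labels = {206: "electronics", 212: "cameras", 220: "digital SLR",
          222: "underwater", 224: "children's"}
potential_terms = leaf_terms_by_category(tree, labels)
```

Here potential_terms associates the category “cameras” with the three leaf descriptions, which remain only potential contrarian terms until confirmed.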

One example of a leaf page 300 is shown in FIG. 3. As shown in FIG. 3, a leaf page 300 may contain data related to a single product or service, such as a particular camera 302. Leaf pages, such as web page 300, may also contain an identifiable description of the product or service 312, product reviews 316 and data related to the product or service 314 including specification data and/or price data, for example. Contrarian terms in a given leaf page may be found in website computer code identifiers, such as HTML tags or JavaScript tags, for example. In general, computer code tags provide formatting or display instructions to website computer code. In some cases, products and/or services are formatted or displayed in predictable ways that can be detected by a website code parser or compiler. For example, products and/or services for sale may be identified by an HTML tag indicating the product and/or service is linked to a shopping cart capable of carrying out a financial transaction.

Some web pages, such as those in a higher level of the tree structure, may contain lists of products or services. One example of a web page containing lists of products is web page 400 as shown in FIG. 4. Web pages in hierarchy 200 may include one or several product lists 414, 416, 418, 420 and 422 as shown in web page 400, each product list representing a category of products and including a number of product descriptions. According to one embodiment, the product descriptions within a product list 414 may be considered to be contrarian terms within the category “brand” 424. For example, in product list 414, potential contrarian terms include “Brand 1,” “Brand 2,” “Brand 3,” “Brand 4,” etc. These terms may be considered only potential contrarian terms until the contrarian nature of the terms has been confirmed by a predetermined number of websites.

Furthermore, the product terms in product lists 414, 416, 418, 420, and 422 may only be contrarian with respect to a particular list category 424, 426, 428, 430 and 432, such that the terms are not deemed to be contrarian unless associated with that particular list category. For example, although the terms “Brand 1” and “Brand 2” may be contrarian in the context of a camera brand category 424, these terms may not be considered contrarian with respect to one another in general. Such list categories may be identified by web page computer code, such as HTML or JavaScript tags. The list category associated with such contrarian terms may be systematically determined according to process 500, described in greater detail below, by determining a description for the product list category 424, 426, 428, 430, and 432 for various product lists 414, 416, 418, 420, 422.

Referring to FIG. 5, a process 500 for mining contrarian terms and contrarian term categories from a website is shown according to one embodiment. Process 500 may be carried out by executable code stored in memory such as contrarian term detection module 132 that is executed by processing unit 184 or multiple processing units, for example. The executable code may be stored in multiple memory modules or a single memory module. According to one embodiment, the process executed by processing unit 184 may first identify websites that contain data related to products or services, at step 502. For example, identifying product or service web pages may be carried out by having the process 500 read website computer code of network accessible web pages stored at a website server 146. Website computer code may provide identifiers of product or service web pages through HTML tags, for example, at step 502. According to one embodiment, process 500 uses parsing software, such as an HTML parser, to identify product HTML tags.

According to this embodiment, if step 502 determines a website does not offer a product or service, the website may be discarded and process 500 may begin again for another website. Alternatively, if step 502 determines a website offers a product or service, process 500 may move to step 504. In some cases, step 502 may be skipped altogether and process 500 may begin at step 504. According to one embodiment, process 500 uses multiple methods to identify potential contrarian terms stored within the website hierarchy 200. For example, contrarian terms stored in website computer code can be identified based on product and/or service lists, or based on distinct web pages, or leaf pages, focused on a particular product or service within a website hierarchy. According to one embodiment, steps 504, 508 and 512 identify potential contrarian terms based on leaf pages, while steps 506, 510, and 514 identify potential contrarian terms based on product and/or service lists. Although process 500 depicts these multiple methods to identify potential contrarian terms as being performed in parallel, these methods can be performed in series and in any order. Alternatively, only one of these two approaches may be used.

At step 504, process 500 interprets the computer code of a website to determine if the website hierarchy stored in memory includes any leaf pages such as leaf page 300 shown in FIG. 3 and leaf pages 220, 222, 224, 228, 230, 232, 234, 236 and 238 as shown in website hierarchy 200 in FIG. 2, for example. According to one embodiment, the process 500 identifies web pages on the lowest level 290 of hierarchy 200 such as web pages 220, 222, 224, 228, 230, 232, 234, 236 and 238. According to another embodiment, the leaf page detection process in step 504 compares a database of website uniform resource locator (URL) addresses with the website URL being analyzed in process 500. In many cases, websites will explicitly indicate leaf pages in their website URL address. For example, the website example.com explicitly identifies leaf pages as any folder in the website hierarchy contained in the folder labeled “itm,” displayed as http://www.example.com/itm/ in a URL address bar of a web browser application. Accordingly, the URL for each website entering process 500 can be compared with the stored database of URL patterns to identify leaf pages of each website. According to one embodiment, at step 504, process 500 may identify web pages that are devoted to a single product and/or service regardless of hierarchy level. According to yet another embodiment, only web pages that are both on the lowest level 290 of a website hierarchy 200 and are also devoted to a single product and/or service may be identified as leaf pages at step 504. Furthermore, leaf pages identified at step 504 may be used to identify list pages at step 506, as some websites organize leaf pages and list pages in predictable sequences with respect to hierarchy 200.
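The URL comparison embodiment of step 504 can be sketched in Python as follows. The sketch assumes the stored database of URL patterns is a mapping from host name to a regular expression over the URL path; that representation, the name LEAF_PATTERNS, and the function is_leaf_url are assumptions, with only the “itm” folder convention taken from the example above.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern database: host name -> pattern for leaf page paths.
LEAF_PATTERNS = {"www.example.com": re.compile(r"^/itm/.+")}

def is_leaf_url(url, patterns=LEAF_PATTERNS):
    """Return True if the URL matches the stored leaf pattern for its host."""
    parts = urlparse(url)
    pattern = patterns.get(parts.netloc)
    return bool(pattern and pattern.match(parts.path))
```

A URL such as http://www.example.com/itm/12345 would be treated as a leaf page, while a URL on a host with no stored pattern would not.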

At step 508, once a leaf page has been identified, potential contrarian terms within the leaf page, such as leaf page 300 as shown in FIG. 3, may be extracted from the website code. For example, leaf pages such as web page 300 may also contain an identifiable description of the product or service 310. Step 508 may read the website computer code, such as computer code tags of page 300, to identify the potential contrarian term. According to one embodiment, process 500 uses parsing software to read website computer code. In addition, process 500 may implement multiple code specific parsers in various steps of process 500, such as an HTML parser and a JavaScript parser, to interpret websites written in various computer codes. According to one embodiment, each leaf page such as leaf page 220, 222 and 224 will have a contrarian term identified at step 508. For example, reading the computer code for page 220 may establish that page 220 is a product page for digital SLR cameras. Furthermore, step 508 may determine leaf page 222 contains data for an underwater camera, and leaf page 224 contains data for a children's camera. Accordingly, the terms “underwater,” “children's” and “digital SLR” may be considered potential contrarian terms within the category of cameras. In addition, at step 512, a category or classification for each of the identified potential contrarian terms may be determined.

According to one embodiment, potential contrarian terms are associated with a category or class by using a bread crumb analyzer. According to one embodiment, a bread crumb analyzer reads computer code that represents a navigation path. Bread crumbs or bread crumb trails are generally displayed on websites to allow a user to keep track of their navigation path within a website. For example, in FIG. 4, elements 404, 406 and 408 constitute a bread crumb trail, or navigation path, that corresponds to website hierarchy 200 shown in FIG. 2. Likewise, in FIG. 3, elements 304, 306, 308, and 310 constitute a bread crumb trail, or navigation path, that corresponds to website hierarchy 200 shown in FIG. 2. Accordingly, at step 512 of process 500, the process may read the navigation path on web page 300 to determine that the contrarian term “digital SLR” contained in graphical interface element 310 is associated with the category of “cameras” contained in graphical interface element 308. Specifically, due to the nature of displayed navigation paths shown on website pages, step 512 may determine the second to last graphical interface element 308 in a navigation path is the category associated with the contrarian term 310, according to one embodiment.
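The second to last element rule of step 512 can be sketched in a few lines of Python. The sketch assumes the bread crumb trail has already been extracted into an ordered list of strings; the function name category_from_breadcrumb is hypothetical, and the first two trail entries in the usage note are assumed, since the text above names only the contents of elements 308 and 310.

```python
def category_from_breadcrumb(trail):
    """Return (term, category) from an ordered bread crumb trail.

    The last element is taken as the potential contrarian term and the
    second to last element as its category. Returns None when the trail
    is too short to contain both.
    """
    if len(trail) < 2:
        return None
    return trail[-1], trail[-2]
```

For a trail such as home, electronics, cameras, digital SLR, the sketch returns the term “digital SLR” together with the category “cameras.”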

Referring to a second method of identifying contrarian terms used by process 500, step 506 identifies list pages, such as list page 400 shown in FIG. 4, according to one embodiment. List pages may include one or several product lists 414, 416, 418, 420 and 422, each associated with a product category 424, 426, 428, 430 and 432 as shown in web page 400, for example. These list pages or category pages may be identified at step 506 by website computer code tags, a bread crumb analyzer, website navigation path or URL address database as described previously. According to one embodiment, step 506 looks for web pages that include a list of descendant category nodes, such as nodes 220, 222, and 224, which descend in hierarchy 200 from category 212. According to another embodiment, step 506 may read computer code tags that describe the formatting of descriptions as being displayed in a list format.

According to one embodiment, the list of products and/or services may be extracted from the list page by first recognizing the data arrangement typically resulting in a list format, such as the data arrangement shown in lists 414, 416, 418, 420 and 422, for example. According to one embodiment, the data arrangement is recognized by process 500 by reading website computer code tags. For example, HTML tags will determine that list 414 will be displayed in a single column of terms and that the column will appear on the left hand side of the web page as shown in web page 400. List 414 will appear in an HTML block, for example, such that all terms within the HTML block contain relevant lists of contrarian terms. Once a list page such as list page 400 has been identified, potential contrarian terms as well as categories associated with those contrarian terms may be identified at steps 510 and 514, respectively. Additionally, step 510 may only extract lists that occur in a spatial position within a web page, such as on the left hand side of a web page, where the majority of product and/or service lists may be included. According to one embodiment, the spatial position of the product and/or service list is recognized by process 500 by reading website computer code tags. Furthermore, step 510 may only extract contrarian term lists that contain a descriptive header such as “Brand” 424, “Megapixels” 426, “Condition” 428, “Type” 430, and “optical zoom” 432 as shown in FIG. 4. According to one embodiment, the spatial position and descriptive header data can be identified by process 500 using computer code tags.
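Extraction of headed lists from computer code tags can be sketched with the HTML parser in the Python standard library. The markup convention assumed here, in which each descriptive header is an h3 element followed by li items, is purely hypothetical, as are the class name ListExtractor and the function extract_lists; real list pages would need site-specific tag rules, and spatial position is not modeled.

```python
from html.parser import HTMLParser

class ListExtractor(HTMLParser):
    """Collect <li> items under the most recent <h3> header (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.lists = {}        # header text -> list of item texts
        self._header = None    # most recent header text, if any
        self._capture = None   # "header" or "item" while inside such a tag
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._capture, self._buffer = "header", []
        elif tag == "li" and self._header is not None:
            self._capture, self._buffer = "item", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag == "h3" and self._capture == "header":
            self._header = text
            self.lists[text] = []
            self._capture = None
        elif tag == "li" and self._capture == "item":
            self.lists[self._header].append(text)
            self._capture = None

def extract_lists(html):
    """Return {header: [terms]} for lists that have a descriptive header."""
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return {h: terms for h, terms in parser.lists.items() if terms}
```

Items that appear before any header are skipped, which mirrors the embodiment in which only lists containing a descriptive header are extracted.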

According to one embodiment, the extracted descriptions of products or services within a product list may be considered to be contrarian terms. Referring to FIG. 4, product list 414 contains a list of potential contrarian terms including “Brand 1,” “Brand 2,” “Brand 3,” “Brand 4,” etc. The terms in these product lists 414, 416, 418, 420, and 422 are only contrarian with each other when associated with a category, such as list heading “brand” 424, according to one embodiment. For example, although the terms “Brand 1” and “Brand 2” may be contrarian with respect to the category “brand” 424, these terms may not be considered contrarian in general. Accordingly, at step 514, the contrarian terms extracted from product and/or service lists 414, 416, 418, 420, and 422 may be associated with an identified category.

Referring again to process 500 as shown in FIG. 5, once potential contrarian terms are extracted from website content and are associated with an identified category according to a leaf page identification process and/or a list page identification process, process 500 may optionally process the potential contrarian terms at step 516. According to one embodiment, the potential contrarian terms may be normalized prior to being stored at step 520. For example, the terms may be stemmed, processed to eliminate stop words, or otherwise processed prior to storage. Stop words are common words such as “the,” “of,” “on,” and “a” in the English language. Removing stop words can simplify software processes, simplify terms, improve the accuracy of the process, and reduce the number of contrarian terms stored. With respect to word stemming, software may accept a potential contrarian term such as “boating” and output a word root such as “boat” at step 516. According to one embodiment, word roots are words without any modifiers such as plurals, prefixes, suffixes, or other affixes. In addition, if common words appear between a pair of contrarian terms in a common category, these common words are eliminated, according to one embodiment.
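The normalization at step 516 can be illustrated with a minimal sketch. The stop word set contains only the examples named above, and the suffix stripper is a deliberately crude stand-in for a real stemming algorithm; the names `naive_stem`, `normalize`, and `drop_common_words` are hypothetical and chosen for illustration.

```python
STOP_WORDS = {"the", "of", "on", "a"}   # only the examples named in the text
SUFFIXES = ("ing", "s")                 # assumed, intentionally tiny suffix list


def naive_stem(word):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(term):
    """Lowercase a term, drop stop words, and stem the remaining words."""
    words = [w for w in term.lower().split() if w not in STOP_WORDS]
    return " ".join(naive_stem(w) for w in words)


def drop_common_words(term_a, term_b):
    """Remove words shared by both terms of a pair in a common category."""
    words_a, words_b = term_a.split(), term_b.split()
    common = set(words_a) & set(words_b)
    return (
        " ".join(w for w in words_a if w not in common),
        " ".join(w for w in words_b if w not in common),
    )


print(normalize("the boating"))                             # boat
print(drop_common_words("men's shoes", "women's shoes"))    # ("men's", "women's")
```

A production system would likely substitute an established stemmer and a fuller stop word list; the sketch only shows where each operation fits.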

For example, if the category is “shoes” and the extracted potential contrarian terms associated with that category include “men's shoes,” and “women's shoes,” the term “shoes” would be eliminated from both contrarian terms such that “men's” and “women's” would be the potential contrarian terms forwarded to step 518. Furthermore, according to one embodiment, step 516 may process the category associated with the extracted contrarian terms using a web page classifier. In this embodiment, at step 516, a web page classifier may compare the extracted categories received from steps 512 and 514 to a uniform category or classification structure stored in memory. Because each individual website uses its own category structure and category terms, the identified category structure can be converted into the uniform structure stored in memory. The uniform category or classification structure may be stored in a server associated with an entity providing keywords to advertisers. The categories extracted from a web page may be compared with terms in the uniform category structure to determine which uniform category the extracted contrarian terms should be associated with. According to one embodiment, the extracted category is compared with all terms in the uniform category structure and the comparison that results in the highest average weighted result according to a software algorithm is selected as the correct uniform category. For example, if “shoes” is identified as a category at step 514, “shoes” may be classified under the uniform category term “footwear” rather than “shoes.” The software algorithm may take several factors into account, such as similarity of characters in the web page category and uniform category or contextual information included on the web page analyzed in process 500.
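One way the comparison against a uniform category structure might be sketched is shown below. The weighting scheme, the weights themselves, and the contents of `UNIFORM_CATEGORIES` are assumptions made for illustration; the specification does not define the software algorithm beyond the factors it may consider. Character similarity is computed with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical uniform structure: uniform category -> words that tend to
# appear on web pages belonging to that category.
UNIFORM_CATEGORIES = {
    "footwear": {"shoes", "boots", "sandals", "sneakers"},
    "cameras": {"megapixels", "zoom", "lens"},
}

CHAR_WEIGHT = 0.3      # assumed weight for character similarity
CONTEXT_WEIGHT = 0.7   # assumed weight for contextual overlap


def score(site_category, page_words, uniform_name, uniform_words):
    """Weighted combination of character similarity and context overlap."""
    char_sim = SequenceMatcher(None, site_category, uniform_name).ratio()
    context = len(page_words & uniform_words) / len(uniform_words)
    return CHAR_WEIGHT * char_sim + CONTEXT_WEIGHT * context


def to_uniform_category(site_category, page_words):
    """Return the uniform category with the highest weighted score."""
    words = set(page_words) | {site_category}
    return max(
        UNIFORM_CATEGORIES,
        key=lambda name: score(
            site_category, words, name, UNIFORM_CATEGORIES[name]
        ),
    )


print(to_uniform_category("shoes", ["boots", "sandals"]))   # footwear
```

Here the website category “shoes” maps to “footwear” because the contextual words on the page overlap with that uniform category, even though the character similarity between the two names is low.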

At step 518, the potential contrarian terms may be confirmed as contrarian terms. According to one embodiment, process 500 compares contrarian terms extracted from the website currently undergoing process 500 with terms extracted from previously reviewed websites at step 518. If the extracted and/or processed term matches with a previously stored term in a common category, the contrarian term may be confirmed as a contrarian term. Terms need not necessarily match characters exactly for a match to be determined. For example, two terms may be considered matching if a predetermined number or percentage of characters are in common. According to another embodiment, the potential contrarian term is confirmed at step 518 if it is matched with a predetermined number of previously stored terms in a common category. In some cases, step 518 may be skipped, or processed terms may be deemed to be contrarian if there is insufficient prior data with which to compare the processed terms.
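The confirmation at step 518 can be sketched as an approximate match against previously stored terms in the same category. The threshold values and the use of `difflib.SequenceMatcher` as the measure of characters in common are assumptions; the function names and the dictionary standing in for the stored terms are hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8   # assumed share of characters that must agree
MIN_MATCHES = 1         # assumed number of stored terms that must match


def terms_match(a, b, threshold=MATCH_THRESHOLD):
    """Treat two terms as matching if enough characters are in common."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def confirm(term, category, library, min_matches=MIN_MATCHES):
    """Confirm a potential contrarian term against stored terms."""
    stored = library.get(category)
    if not stored:
        # Insufficient prior data: the term is deemed contrarian.
        return True
    matches = sum(1 for s in stored if terms_match(term, s))
    return matches >= min_matches


# Hypothetical stored terms: category -> previously confirmed terms.
library = {"shoes": {"men's", "women's"}}
print(confirm("mens", "shoes", library))      # True
print(confirm("leather", "shoes", library))   # False
```

Raising `min_matches` corresponds to the embodiment in which a term must match a predetermined number of previously stored terms before it is confirmed.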

Once the contrarian terms are confirmed at step 518, at step 520, the terms and associated category may be stored in the memory of a computing device, such as contrarian term library 134 of server 126 (FIG. 1). The contrarian term library 134 may be used to compare previously stored contrarian terms with newly extracted potential contrarian terms at step 518. Once the contrarian terms have been stored in association with a category at step 520, the resulting database may be used to provide data to a user in a graphical user interface. For example, the database of contrarian terms associated with a category may be used to provide keyword suggestions to advertisers desiring to bid on keywords from a content platform.

FIG. 6 provides one example of a graphical user interface 600, which may be a web page, that transmits and receives contrarian term data from a contrarian term library stored in memory 134, for example. Web page 600 may be a web page in a website associated with a content platform 198 that offers content associated with search engine results. The website may be stored in the memory 182 of a server 140 or 146, for example. Graphical user interface 600 may be viewed on display 150 of a network connected device such as client device 102, for example. Client devices 102 may include a processing circuit 160, a memory 170, a network interface 140, a user input element 106 as well as a display 150, according to one embodiment. Client device 102 may be used to view a webpage 108 stored on web server 126, 140 or 146, using a software browser application 106. According to one embodiment, a user such as an advertiser may use graphical user interface 600 to select keywords they would like to bid on.

According to one embodiment, graphical user interface 600 is an interface for a keyword tool program. Using the keyword tool program, a user can receive keyword suggestions by entering the product or service they would like to advertise into graphical element 604 and the category of the product or service into graphical element 602. This input data can be used to retrieve suggested keywords at graphical element 606. According to one embodiment, graphical user interface 600 accesses the contrarian term data and associated category data stored in memory, such as contrarian term library 134, to provide the suggested keywords in graphical element 606. According to one embodiment, the order of the keyword suggestion list in element 606 is determined by the degree of similarity between the terms entered into graphical elements 604 and 602 and the contrarian terms and associated category stored in contrarian term library 134. For example, if process 500 has determined that the terms “men's” and “women's” are contrarian terms with respect to the category “shoes” and a user, such as an advertiser, enters the phrase “men's shoes” in graphical element 604 and the category “shoes” in graphical element 602, the keyword term “women's shoes” will be demoted to a lower position in the keyword suggestion list 606. According to one embodiment, web page and graphical user interface 600 may also be used by an advertiser to access contrarian term library 134 to provide a group suggestion for a series of contrarian terms such as “Men's,” “Women's.” For example, a user may submit a request into a graphical user interface element for the grouping associated with “Women's.” Web page 600 may then connect to contrarian term library 134 and retrieve and display a broader group such as “gender” to the user. Furthermore, according to one embodiment, web page 600 may allow a user such as an advertiser to purchase or bid on keyword combinations that include an entire group of contrarian terms. For example, web page 600 may provide a list of keyword suggestions 606 that includes “gender” shoes in addition to men's shoes and women's shoes, according to one embodiment.
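The demotion of contrarian keywords in suggestion list 606 can be illustrated with the following sketch. It assumes a simple two-group ordering in which conflicting candidates are moved to the end of the list; the function name `order_suggestions` and the dictionary standing in for contrarian term library 134 are hypothetical, and the ranking actually used by the keyword tool is not specified here.

```python
def order_suggestions(entered_term, category, candidates, library):
    """Return candidates with contrarian ones moved to the end."""
    contrarian = library.get(category, set())
    entered_words = set(entered_term.lower().split())
    # Contrarian terms that the user did not enter conflict with the
    # contrarian term that the user did enter.
    used = entered_words & contrarian
    conflicting = contrarian - entered_words if used else set()

    def is_demoted(candidate):
        return bool(set(candidate.lower().split()) & conflicting)

    # sorted() is stable, so the original order is kept within each group.
    return sorted(candidates, key=is_demoted)


# Hypothetical library contents: category -> contrarian terms.
library = {"shoes": {"men's", "women's"}}
candidates = ["women's shoes", "men's running shoes", "men's boots"]
print(order_suggestions("men's shoes", "shoes", candidates, library))
# ["men's running shoes", "men's boots", "women's shoes"]
```

Because “men's” and “women's” are stored as contrarian terms for the category “shoes,” a user who enters “men's shoes” sees “women's shoes” at a lower position in the list.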

Referring again to FIG. 1, elements 126, 140, 144, and 198 may be computing devices having a processor and a memory. The processing circuit may include digital and/or analog electrical components (e.g., a microprocessor, application-specific integrated circuit, microcontroller, or other digital logic) configured to perform the functions described herein. The processing circuit may be a single server computer or a plurality of server computers, and may operate in a cloud computing environment, such as a shared, scalable computing environment. The memory includes storage media, which may be volatile or non-volatile memory that includes, for example, read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices and zip drives. The memory may store data files associated with particular websites in a database format.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software embodied on a tangible medium, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, data processing apparatus, such as a processing circuit. A processing circuit such as CPU 184 may include any digital and/or analog circuit components configured to perform the functions described herein, such as a microprocessor, microcontroller, application-specific integrated circuit, programmable logic, etc. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices). Furthermore, the computer readable storage medium does not include a signal.

The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources. The term “data processing apparatus” or “computing device” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

What is claimed is:
1. A method of identifying contrarian terms comprising: receiving, using one or more data processors, data for a plurality of websites; determining, using one or more data processors, a set of websites from the plurality of websites that describe one or more products or services based on one or more tags associated with each of the set of websites; extracting, for each of the set of websites by one or more data processors, a set of contrarian terms, wherein extracting the set of contrarian terms comprises: extracting a first term and a second term from a website of the set of websites; identifying a category for the first term based on analyzed content for the website of the set of websites; identifying the category for the second term based on analyzed content for the website of the set of websites; determining the first term and the second term are contrarian terms for the set of contrarian terms based on the category and a first description of a product or service associated with the first term and a second description of a product or service associated with the second term; and storing the set of contrarian terms and the identified category in a memory.
2. The method of claim 1, wherein identifying the category for the first term comprises: determining, using one or more data processors, a navigation path associated with a web page for a product or service associated with the first term for the website of the set of websites, and identifying, using one or more data processors, the category for the first term based on a prior element of the determined navigation path.
3. The method of claim 1 further comprising: analyzing, using one or more data processors, content of each of the set of websites by identifying a website organizational structure of each of the set of websites, the website organizational structure comprising data representing a plurality of web pages of each website and data representing relationships of each web page of the plurality of web pages to one or more other web pages of the plurality of web pages.
4. The method of claim 3, wherein identifying a website organizational structure comprises identification of a plurality of leaf web pages or identification of a list web page.
5. The method of claim 4, wherein an identified leaf web page includes a description of a product or service, wherein extracting a contrarian term of the set of contrarian terms is based on the description of the product or service of the leaf web page.
6. The method of claim 4, wherein an identified list web page comprises a product list associated with a product category, wherein extracting a contrarian term of the set of contrarian terms is based on the product list of the identified list web page.
7. The method of claim 6, wherein identifying the category for the extracted contrarian term based on the product list is based on the associated product category.
8. The method of claim 1, further comprising: normalizing, using one or more data processors, each extracted contrarian term of the set of contrarian terms by word stemming or eliminating stop words.
9. The method of claim 1, further comprising: comparing, using one or more data processors, a similarity of an identified category for each contrarian term of the set of contrarian terms to a uniform category structure.
10. The method of claim 1, further comprising: generating, using one or more data processors, display data representing a list of suggested keywords, wherein the order of the displayed list of suggested keywords is based on the set of contrarian terms and identified categories for the set of contrarian terms.
11. The method of claim 10, wherein the list of suggested keywords are generated in response to a user entering a keyword into a graphical user interface.
12. A system for serving content items comprising: one or more data processors; and one or more storage devices storing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform operations comprising: identifying a plurality of leaf web pages of a website or a list web page of a website, extracting a set of contrarian terms from the plurality of identified leaf web pages or the identified list web page, wherein extracting the set of contrarian terms comprises: extracting a first term and a second term from the plurality of identified leaf web pages or the identified list web page; identifying a category for the first term based on analyzed content for the plurality of identified leaf web pages or the identified list web page; identifying the category for the second term based on analyzed content for the website of the set of websites; and determining the first term and the second term are contrarian terms for the set of contrarian terms based on the category and a first description of a product or service associated with the first term and a second description of a product or service associated with the second term; and storing the set of contrarian terms and the category in a memory.
13. The system of claim 12, wherein identifying the plurality of leaf web pages or the list web page is based on parsing one or more tags associated with the website.
14. The system of claim 12, wherein identifying the category for the first term is based on a common parent web page of the plurality of leaf web pages.
15. The system of claim 12, wherein identifying the category for the first term comprises: determining a navigation path associated with a web page for a product or service associated with the first term, and identifying the category for the first term based on a prior element of the determined navigation path.
16. The system of claim 12, wherein the identified list web page comprises a product list and a product category, wherein extracting the set of contrarian terms is based on the product list, and wherein identifying the category for the first term is based on the product category.
17. The system of claim 12, wherein the one or more storage devices store instructions that cause the one or more data processors to perform operations further comprising: generating display data representing a list of suggested keywords, wherein the order of the displayed list of suggested keywords is based on the set of contrarian terms and the identified category.
18. A computer readable storage device storing instructions that, when executed by one or more data processors, cause the one or more data processors to perform operations comprising: identifying a plurality of leaf web pages of a website or a list web page of a website; extracting a set of contrarian terms from the plurality of identified leaf web pages or the identified list web page, wherein extracting the set of contrarian terms comprises: extracting a first term and a second term from the plurality of identified leaf web pages or the identified list web page; identifying a category for the first term based on analyzed content for the plurality of identified leaf web pages or the identified list web page; identifying the category for the second term based on analyzed content for the website of the set of websites; and determining the first term and the second term are contrarian terms for the set of contrarian terms based on the category and a first description of a product or service associated with the first term and a second description of a product or service associated with the second term; and storing the first and second set of contrarian terms and the identified category in a memory.